{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Re (temp)\n",
    "### Code\n",
    "#### About r'\\b(\\w+)(\\s+\\1){1,}\\b'\n",
    "* r'...': '防止字符转义\n",
    "* (): 被括起来的表达式将作为分组，从表达式左边开始没遇到一个分组的左括号\"(\"，编号+1.分组表达式作为一个整体，可以后接数量词\n",
    "* \\1: \"\\1\"是对第一个分组的引用\n",
    "* {1,}: 至少匹配到1个\n",
    "* \\b: 匹配一个单词边界，即字与空格间的位置\n",
    "\n",
    "#### About compile()\n",
    "> 编译正则表达式模式，返回一个对象。可以把常用的正则表达式编译成正则表达式对象，方便后续调用(.search(), .match(), .findall()等)及提高效率"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "string = 'I am am a teacher'\n",
    "pattern_1 = re.compile(r'\\b(\\w+)(\\s+\\1){1,}\\b')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### About search(string)\n",
    "> 对整个字符串string进行搜索匹配，返回第一个匹配的字符串的 match 对象\n",
    "\n",
    "#### About sub(pattern, repl, string)\n",
    "> sub是substitute的所写，表示替换\n",
    "##### 三个必要参数\n",
    "* pattern：表示正则中的模式字符串;但如果是模式字符串对象来调用sub则不需该参数\n",
    "* repl：replacement->需被替换成的子字符串\n",
    "* string: 要被进行替换操作的完整字符串\n",
    "\n",
    "\n",
    "#### About .group()\n",
    "> 获得对应分组的内容,group(0)为匹配到的完整字符串\n",
    "\n",
    "\n",
    "#### 综合\n",
    "```json\n",
    "pattern_1.sub(result_1.group(1),string)\n",
    "\n",
    "{\n",
    "    \"pattern_1\":\"正则表达式对象\",\n",
    "    \"result_1\":\"匹配到的match对象,包含分组信息\",\n",
    "    \"result_1.group(1)\":\"在该match对象里获取组号为1的内容\",\n",
    "    \"x\":\"将string按pattern_1的模式进行匹配,并且替换为result_1.group(1)得来的子字符串内容\"\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "I am a teacher\n"
     ]
    }
   ],
   "source": [
    "result_1 = pattern_1.search(string)\n",
    "x1 = pattern_1.sub(result_1.group(1),string)\n",
    "print(x1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "am am ; am ;  am\n"
     ]
    }
   ],
   "source": [
    "print(result_1.group(0),';',result_1.group(1),\";\",result_1.group(2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### About ?P\n",
    "* (?P &lt;name>): 分组，除了原有的编号外再指定一个额外的别名\n",
    "* (?P=name): 引用别名为\\<name>的分组匹配到字符串\n",
    "\n",
    "> 类似于上述'\\1', 原来是用数字, 现在可以取名字给特定组;取名后数字仍可引用"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "pattern_2 = re.compile(r'(?P<f>\\b\\w+\\b)\\s(?P=f)')\n",
    "result_2 = pattern_2.search(string)\n",
    "x2 = string.replace(result_2.group(0), result_2.group(1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "am am ; am ; am ;  am\n"
     ]
    }
   ],
   "source": [
    "print(result_2.group(0),';',result_2.group(1),';',result_2.group('f'),\";\",result_1.group(2))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
